A Note on M. N. Katehakis' and Y.-R. Chen's Computation of the Gittins Index
Abstract
In a recent paper Katehakis and Chen propose a sequence of linear programs for the computation of the Gittins indices. If there are $N$ projects and project $\nu$ has $K_\nu$ states, then $\sum_{\nu=1}^{N} K_\nu$ linear programs have to be solved. In this note it is shown that, instead of the $K_\nu$ linear programs for project $\nu$, one parametric linear program with the same dimensions can be solved.

1. Introduction. In a recent paper Katehakis and Chen (1984) propose a sequence of linear programs for the computation of the Gittins indices. In this note we show a computationally more favorable approach using parametric linear programming. Wherever possible, Katehakis and Chen's notation is followed.

Consider the following version of the multi-armed bandit problem. There are $N$ projects, and at each instant of time project $\nu$ is in one of the states of the set $S_\nu = \{1, 2, \ldots, K_\nu\}$. After observing the state of each project, one project must be selected to work on. If project $\nu$ is selected at time $t$ and is in state $i$, then a reward $R_\nu(i)$ is earned and $p_\nu(j \mid i)$ denotes the probability that project $\nu$ is in state $j$ at the next instant of time (the states of the unselected projects are unchanged). The problem is to find a rule for selecting the projects such that the expected total $\alpha$-discounted reward, for a discount factor $\alpha \in [0, 1)$, is maximized. Gittins and his co-workers have shown the existence of numbers $M_\nu(i)$, $1 \le i \le K_\nu$, $1 \le \nu \le N$, such that if at time point $t$ project $\nu$ is in state $x_\nu(t)$, $1 \le \nu \le N$, an optimal rule is to select project $\nu^*$, where
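As a rough illustration of the model above, the following Python sketch computes an index for every state of a single project and then picks the project with the largest current index. It is not the parametric linear program of this note, nor the linear programs of Katehakis and Chen. It swaps in a different, well-known technique: the restart-in-state characterization of the Gittins index, solved here by plain value iteration. The function names `gittins_indices` and `select_project` are hypothetical, states are numbered from 0 rather than 1, and the indices are returned on the reward-rate scale $(1-\alpha)V$, which may differ from the normalization of the numbers $M$ used in the note.

```python
def gittins_indices(R, P, alpha, tol=1e-12, max_iter=100_000):
    """Indices of one project via the restart-in-state formulation.

    R[j]    : reward earned in state j (0-based states; an assumption)
    P[j][k] : probability of moving from state j to state k
    alpha   : discount factor, 0 <= alpha < 1
    """
    K = len(R)
    indices = []
    for i in range(K):
        # Auxiliary problem: in every state, either continue
        # or restart the project in state i.
        V = [0.0] * K
        for _ in range(max_iter):
            # One-step value of continuing from each state j.
            cont = [
                R[j] + alpha * sum(P[j][k] * V[k] for k in range(K))
                for j in range(K)
            ]
            # Restarting in i is worth cont[i] from any state.
            new_V = [max(cont[j], cont[i]) for j in range(K)]
            diff = max(abs(a - b) for a, b in zip(new_V, V))
            V = new_V
            if diff < tol:
                break
        # Reward-rate normalization of the index (an assumption).
        indices.append((1.0 - alpha) * V[i])
    return indices


def select_project(index_tables, states):
    """Index rule: choose the project whose current state has the largest index."""
    return max(range(len(states)), key=lambda v: index_tables[v][states[v]])


if __name__ == "__main__":
    # Hypothetical two-state project: state 0 pays 0 and moves to state 1,
    # state 1 pays 1 and is absorbing.
    R = [0.0, 1.0]
    P = [[0.0, 1.0], [0.0, 1.0]]
    print(gittins_indices(R, P, alpha=0.9))
```

Value iteration is used only because it is short and dependency-free. Each project is handled separately, which mirrors the decomposition that makes index rules attractive: the work grows with the sum of the per-project state counts rather than with their product.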
Similar resources
Computing Optimal Sequential Allocation Rules in Clinical Trials
Michael N. Katehakis (State University of New York at Stony Brook) and Cyrus Derman (Columbia University). The problem of assigning one of several treatments in clinical trials is formulated as a discounted bandit problem that was studied by Gittins and Jones. The problem involves comparison of certain state-dependent indices. A recent characterization of the index is used to calculate more efficient...
Linear Functions Preserving Sut-Majorization on RN
Suppose $\textbf{M}_{n}$ is the vector space of all $n$-by-$n$ real matrices, and let $\mathbb{R}^{n}$ be the set of all $n$-by-$1$ real vectors. A matrix $R \in \textbf{M}_{n}$ is said to be $\textit{row substochastic}$ if it has nonnegative entries and each row sum is at most $1$. For $x$, $y \in \mathbb{R}^{n}$, it is said that $x$ is $\textit{sut-majorized}$ by $y$ (denoted by $x \prec_{sut} y$) if t...
Q-Learning for Bandit Problems
Multi-armed bandits may be viewed as decompositionally-structured Markov decision processes (MDPs) with potentially very large state sets. A particularly elegant methodology for computing optimal policies was developed over twenty years ago by Gittins [Gittins & Jones, 1974]. Gittins' approach reduces the problem of finding optimal policies for the original MDP to a sequence of low-dimensional stopping...
A note on convergence in fuzzy metric spaces
The sequential $p$-convergence in a fuzzy metric space, in the sense of George and Veeramani, was introduced by D. Mihet as a weaker concept than convergence. Here we introduce a stronger concept called $s$-convergence, and we characterize those fuzzy metric spaces in which convergent sequences are $s$-convergent. In such a case $M$ is called an $s$-fuzzy metric. If $(N_M,\ast)$ is a fuzzy metri...
Restart Probability Model
We discuss a new applied probability model: there is a system whose evolution is described by a Markov chain (MC) with known transition matrix on a discrete state space and at each moment of a discrete time a decision maker can apply one of three possible actions: continue, quit, and restart MC in one of a finite number of fixed “restarting” points. Such a model is a generalization of a model d...
Journal: Math. Oper. Res.
Volume: 11
Publication date: 1986